While inferring common actor states (such as position or velocity) is an important and well-explored task of the perception system aboard a self-driving vehicle (SDV), it may not always provide sufficient information to the SDV. This is especially true in the case of active emergency vehicles (EVs), where light-based signals also need to be captured to provide a full context. We consider this problem and propose a sequential methodology for the detection of active EVs, using an off-the-shelf CNN model operating at a frame level and a downstream smoother that accounts for the temporal aspect of flashing EV lights. We also explore model improvements through data augmentation and training with additional hard samples.
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
StyleGAN has achieved great progress in 2D face reconstruction and semantic editing via image inversion and latent editing. While studies over extending 2D StyleGAN to 3D faces have emerged, a corresponding generic 3D GAN inversion framework is still missing, limiting the applications of 3D face reconstruction and semantic editing. In this paper, we study the challenging problem of 3D GAN inversion where a latent code is predicted given a single face image to faithfully recover its 3D shapes and detailed textures. The problem is ill-posed: innumerable compositions of shape and texture could be rendered to the current image. Furthermore, with the limited capacity of a global latent code, 2D inversion methods cannot preserve faithful shape and texture at the same time when applied to 3D models. To solve this problem, we devise an effective self-training scheme to constrain the learning of inversion. The learning is done efficiently without any real-world 2D-3D training pairs but proxy samples generated from a 3D GAN. In addition, apart from a global latent code that captures the coarse shape and texture information, we augment the generation network with a local branch, where pixel-aligned features are added to faithfully reconstruct face details. We further consider a new pipeline to perform 3D view-consistent editing. Extensive experiments show that our method outperforms state-of-the-art inversion methods in both shape and texture reconstruction quality. Code and data will be released.
translated by 谷歌翻译
Deep neural networks have strong capabilities of memorizing the underlying training data, which can be a serious privacy concern. An effective solution to this problem is to train models with differential privacy, which provides rigorous privacy guarantees by injecting random noise to the gradients. This paper focuses on the scenario where sensitive data are distributed among multiple participants, who jointly train a model through federated learning (FL), using both secure multiparty computation (MPC) to ensure the confidentiality of each gradient update, and differential privacy to avoid data leakage in the resulting model. A major challenge in this setting is that common mechanisms for enforcing DP in deep learning, which inject real-valued noise, are fundamentally incompatible with MPC, which exchanges finite-field integers among the participants. Consequently, most existing DP mechanisms require rather high noise levels, leading to poor model utility. Motivated by this, we propose Skellam mixture mechanism (SMM), an approach to enforce DP on models built via FL. Compared to existing methods, SMM eliminates the assumption that the input gradients must be integer-valued, and, thus, reduces the amount of noise injected to preserve DP. Further, SMM allows tight privacy accounting due to the nice composition and sub-sampling properties of the Skellam distribution, which are key to accurate deep learning with DP. The theoretical analysis of SMM is highly non-trivial, especially considering (i) the complicated math of differentially private deep learning in general and (ii) the fact that the mixture of two Skellam distributions is rather complex, and to our knowledge, has not been studied in the DP literature. Extensive experiments on various practical settings demonstrate that SMM consistently and significantly outperforms existing solutions in terms of the utility of the resulting model.
translated by 谷歌翻译
Multi-instance learning (MIL) is a great paradigm for dealing with complex data and has achieved impressive achievements in a number of fields, including image classification, video anomaly detection, and far more. Each data sample is referred to as a bag containing several unlabeled instances, and the supervised information is only provided at the bag-level. The safety of MIL learners is concerning, though, as we can greatly fool them by introducing a few adversarial perturbations. This can be fatal in some cases, such as when users are unable to access desired images and criminals are attempting to trick surveillance cameras. In this paper, we design two adversarial perturbations to interpret the vulnerability of MIL methods. The first method can efficiently generate the bag-specific perturbation (called customized) with the aim of outsiding it from its original classification region. The second method builds on the first one by investigating the image-agnostic perturbation (called universal) that aims to affect all bags in a given data set and obtains some generalizability. We conduct various experiments to verify the performance of these two perturbations, and the results show that both of them can effectively fool MIL learners. We additionally propose a simple strategy to lessen the effects of adversarial perturbations. Source codes are available at https://github.com/InkiInki/MI-UAP.
translated by 谷歌翻译
We report the result of the first edition of the WMT shared task on Translation Suggestion (TS). The task aims to provide alternatives for specific words or phrases given the entire documents generated by machine translation (MT). It consists two sub-tasks, namely, the naive translation suggestion and translation suggestion with hints. The main difference is that some hints are provided in sub-task two, therefore, it is easier for the model to generate more accurate suggestions. For sub-task one, we provide the corpus for the language pairs English-German and English-Chinese. And only English-Chinese corpus is provided for the sub-task two. We received 92 submissions from 5 participating teams in sub-task one and 6 submissions for the sub-task 2, most of them covering all of the translation directions. We used the automatic metric BLEU for evaluating the performance of each submission.
translated by 谷歌翻译
Information Extraction (IE) aims to extract structured information from heterogeneous sources. IE from natural language texts include sub-tasks such as Named Entity Recognition (NER), Relation Extraction (RE), and Event Extraction (EE). Most IE systems require comprehensive understandings of sentence structure, implied semantics, and domain knowledge to perform well; thus, IE tasks always need adequate external resources and annotations. However, it takes time and effort to obtain more human annotations. Low-Resource Information Extraction (LRIE) strives to use unsupervised data, reducing the required resources and human annotation. In practice, existing systems either utilize self-training schemes to generate pseudo labels that will cause the gradual drift problem, or leverage consistency regularization methods which inevitably possess confirmation bias. To alleviate confirmation bias due to the lack of feedback loops in existing LRIE learning paradigms, we develop a Gradient Imitation Reinforcement Learning (GIRL) method to encourage pseudo-labeled data to imitate the gradient descent direction on labeled data, which can force pseudo-labeled data to achieve better optimization capabilities similar to labeled data. Based on how well the pseudo-labeled data imitates the instructive gradient descent direction obtained from labeled data, we design a reward to quantify the imitation process and bootstrap the optimization capability of pseudo-labeled data through trial and error. In addition to learning paradigms, GIRL is not limited to specific sub-tasks, and we leverage GIRL to solve all IE sub-tasks (named entity recognition, relation extraction, and event extraction) in low-resource settings (semi-supervised IE and few-shot IE).
translated by 谷歌翻译
Starcraft II(SC2)对强化学习(RL)提出了巨大的挑战,其中主要困难包括巨大的状态空间,不同的动作空间和长期的视野。在这项工作中,我们研究了《星际争霸II》全长游戏的一系列RL技术。我们研究了涉及提取的宏观活动和神经网络的层次结构的层次RL方法。我们研究了课程转移培训程序,并在具有4个GPU和48个CPU线的单台计算机上训练代理。在64x64地图并使用限制性单元上,我们对内置AI的获胜率达到99%。通过课程转移学习算法和战斗模型的混合物,我们在最困难的非作战水平内置AI(7级)中获得了93%的胜利率。在本文的扩展版本中,我们改进了架构,以针对作弊水平训练代理商,并在8级,9级和10级AIS上达到胜利率,为96%,97%和94 %, 分别。我们的代码在https://github.com/liuruoze/hiernet-sc2上。为了为我们的工作以及研究和开源社区提供基线,我们将其复制了一个缩放版本的Mini-Alphastar(MAS)。 MAS的最新版本为1.07,可以在具有564个动作的原始动作空间上进行培训。它旨在通过使超参数可调节来在单个普通机器上进行训练。然后,我们使用相同的资源将我们的工作与MAS进行比较,并表明我们的方法更有效。迷你α的代码在https://github.com/liuruoze/mini-alphastar上。我们希望我们的研究能够阐明对SC2和其他大型游戏有效增强学习的未来研究。
translated by 谷歌翻译
图像介入已取得了显着的进步和启发的丰富方法,其中关键瓶颈被确定为如何实现具有语义上掩盖区域的高频结构和低频纹理信息。为此,深层模型具有强大的优势来捕捉它们,但对当地空间区域受到限制。在本文中,我们在全球范围内深入研究纹理和构造信息,以便很好地捕获图像插入图像的语义。与被困在独立本地补丁上的现有艺术相反,每个贴片的纹理信息都是从整个图像上的所有其他补丁中重建的,以匹配填充的信息,特别是掩盖区域上的结构信息。与用于图像插入的像素级别内的当前仅解码器变压器不同,我们的模型采用了与编码器和解码器配对的变压器管道。一方面,编码器通过自我发项模块捕获了图像跨图像的所有贴片的纹理语义相关性。另一方面,在解码器中,在掩盖区域上填充的贴片的解码器中,自适应贴片词汇是动态建立的。在此基础上,锚定在已知区域上的结构文本匹配的注意模块嫁给了这两个世界中最好的,以通过概率扩散过程进行渐进的介绍。我们的模型与时尚艺术是正交的,例如卷积神经网络(CNN),注意力和变压器模型,从纹理和结构信息的角度用于图像插入图像。基准的广泛实验验证了其优越性。我们的代码可在https://github.com/htyjers/dgts-inpainting上找到。
translated by 谷歌翻译
机器翻译(MT)的单词级质量估计(QE)旨在在不参考的情况下找出翻译句子中的潜在翻译错误。通常,关于文字级别量化宽松的传统作品旨在根据文章编辑工作来预测翻译质量,其中通过比较MT句子之间的单词来自动生成单词标签(“ OK”和“ BAD”)。通过翻译错误率(TER)工具包编辑的句子。虽然可以使用后编辑的工作来在一定程度上测量翻译质量,但我们发现它通常与人类对单词是否良好或翻译不良的判断相抵触。为了克服限制,我们首先创建了一个金色基准数据集,即\ emph {hjqe}(人类对质量估计的判断),专家翻译直接注释了对其判断的不良翻译单词。此外,为了进一步利用平行语料库,我们提出了使用两个标签校正策略的自我监督的预训练,即标记改进策略和基于树的注释策略,以使基于TER的人工量化量子ceper更接近\ emph {HJQE}。我们根据公开可用的WMT en-de和en-ZH Corpora进行实质性实验。结果不仅表明我们提出的数据集与人类的判断更加一致,而且还确认了提议的标签纠正策略的有效性。 。}
translated by 谷歌翻译